ABSTRACT

In semantic equalisation, descriptions of audio transforma- 
tions can be used to control low-level audio effect parame- 
ters. In this paper, we explore sub-representations of these 
descriptions in order to suggest more contextually relevant 
processing parameters to users, based on external influence. 
We propose a methodology for finding sub-representations, 
and an intuitive low-dimensional interface, which can be 
used to recommend equalisation curves based on proxim- 
ity to cluster centroids. 

1. OVERVIEW 

Semantically-informed equalisation involves learning some 
relationship between a subjective description of musical tim- 
bre and a set of audio features, such that a complex param- 
eter space can be controlled using an intuitive low-dimen- 
sional interface. A common way to do this is to collect 
descriptive metadata from audio engineers applying equali- 
sation to a range of audio signals, then to create a map be- 
tween the two spaces through a process of abstraction. Re- 
cently, this has been implemented through the use of dimen- 
sionality reduction applied to a parametric equaliser (EQ) 
[1], or via multiple regression applied to bands of a graphic 
EQ [2].

In music production, equalisation has a common vocab- 
ulary, in which some of the more frequently used descriptors 
include warm, bright, air, crisp and thin [3]. Whilst these
terms provide meaningful representations of EQ parameter 
spaces, there is often high within-term variance in the data, 
suggesting disagreement amongst participants. 

In this paper, we attempt to explain this variance by 
identifying sub-representations of semantic descriptors. 
These are clusters of data points that can be explained by
some external influence. We propose that by identifying the 
sub-representation a user is trying to achieve, we can pro- 
vide a more contextually relevant parameter space for the 
descriptor they are working with. This is done through an 
intuitive predictive interface that can be navigated in two 
dimensions. 

2. METHODOLOGY 

To identify descriptor sub-representations, we apply clustering
to a dataset of annotated equalisation settings. Clusters
are found separately within each term, i.e. within the settings
described as warm and within those described as bright. We then
measure the saliency of the clusters using a number of metrics
and develop an interface for cluster navigation.

2.1. Dataset 

The dataset used for the experiment is taken from the SAFE 
EQ 0, in which 582 entries were labelled as warm and 531 
entries were labelled as bright. The annotated settings were 
collected from audio effects plugins that operate within a 
digital audio workstation, from a corpus of anonymous users. 
For each entry into the dataset, the setting has a string of 
semantic descriptors, a feature vector containing over 100 
audio features per frame extracted before and after process- 
ing, and a parameter space vector that describes the gain, 
bandwidth and centre frequency for each biquad filter (1x
lo-shelf, 3x peak, 1x hi-shelf) in the EQ.

2.2. Clustering 

To find sub-representations we first apply dimensionality re- 
duction to the parameter space using a stacked autoencoder, 
allowing us to reduce the 13-dimensional set of EQ controls 
to a navigable 2-dimensional space. K-means is then ap- 
plied to the low-dimensional space in order to find clusters 
of entries. The optimal number of clusters is set to 3 for 
both warm and bright by maximising the group separability 
using a silhouette score. 
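As an illustrative sketch of the cluster-count selection described above (not the authors' implementation), the following standard-library Python runs k-means for several candidate values of k on 2-dimensional points and keeps the k with the highest average silhouette. It assumes the autoencoder has already produced the 2-dimensional points. The function names, the deterministic farthest-first seeding and the candidate range 2 to 6 are assumptions made for this example.

```python
import math


def farthest_first_init(points, k):
    """Deterministic seeding (an assumption): begin at points[0], then keep
    adding the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = farthest_first_init(points, k)
    labels = []
    for _ in range(max_iter):
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(a) / len(members) for a in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def average_silhouette(points, labels):
    """Mean of (s_i - c_i) / max(s_i, c_i); singleton clusters score 0."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for i, p in enumerate(points):
        mean_dist = {}
        for lab in clusters:
            d = [math.dist(p, q) for j, q in enumerate(points)
                 if labels[j] == lab and j != i]
            if d:
                mean_dist[lab] = sum(d) / len(d)
        if labels[i] not in mean_dist:
            continue  # point is alone in its cluster
        c = mean_dist[labels[i]]                                   # cohesion
        s = min(v for lab, v in mean_dist.items() if lab != labels[i])  # separation
        denom = max(s, c)
        total += (s - c) / denom if denom > 0 else 0.0
    return total / len(points)


def choose_k(points, candidates=range(2, 7)):
    """Return the candidate k whose k-means partition has the best silhouette."""
    return max(candidates,
               key=lambda k: average_silhouette(points, kmeans(points, k)[0]))


# Toy data: three well-separated groups of four points each.
offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]
points = [(cx + dx, cy + dy)
          for cx, cy in [(0, 0), (10, 0), (0, 10)] for dx, dy in offsets]
best_k = choose_k(points)
```

On real data a randomly initialised k-means with several restarts would be the more usual choice; the deterministic seeding here only keeps the toy example reproducible.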

To ensure the data is capable of forming reliable parti- 
tions, we measure the Hopkins Statistic of clustering ten- 
dency, which estimates the likelihood of data points be- 
ing sampled from a non-uniform distribution by comparing 
them with randomly sampled values from the low-dimensional 
space. Once k-means has been applied, we then measure the 
saliency of the sub-representations using two metrics. Ideal
Correlation (IC) measures the coherence between a similarity
matrix of the points in the dataset and a matrix of
binary values, where cells are set to 1 if points are from the
same cluster, and 0 if not. Average Silhouette Score (AS)
measures the compactness and isolation of clusters using the
cohesion (c_i) and separation (s_i) of each point i, where:

$$AS = \frac{s_i - c_i}{\max(s_i, c_i)} \tag{1}$$
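A minimal sketch of how an Ideal Correlation value might be computed is given below. It takes the Pearson correlation between pairwise similarities and the binary same-cluster indicator over all distinct pairs of points. The similarity definition (one minus the distance normalised by the largest pairwise distance) and the function names are assumptions, since the text does not specify them.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)  # undefined if either variance is 0


def ideal_correlation(points, labels):
    """Correlate pairwise similarity with the ideal binary cluster matrix."""
    n = len(points)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    dists = [math.dist(points[i], points[j]) for i, j in pairs]
    d_max = max(dists)
    # Assumed similarity: 1 for identical points, 0 for the farthest pair.
    similarity = [1.0 - d / d_max for d in dists]
    # Ideal matrix entry: 1 if both points share a cluster, otherwise 0.
    ideal = [1.0 if labels[i] == labels[j] else 0.0 for i, j in pairs]
    return pearson(similarity, ideal)


# Toy data: two tight pairs of points that are far from each other.
toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
ic_good = ideal_correlation(toy_points, [0, 0, 1, 1])  # labels match geometry
ic_bad = ideal_correlation(toy_points, [0, 1, 0, 1])   # labels cut across it
```

A labelling that follows the geometry yields a value close to 1, whereas a labelling that cuts across the groups yields a negative value.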

To evaluate the influence of external data on the clusters, 
we measure the divergence between audio feature distribu- 
tions within each cluster, before and after audio processing 
has been applied. This is done using Kullback-Leibler Divergence
(KLD), where the target distribution (P) is the feature
set after processing, and the approximating distribution (Q)
is the input feature set.
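For illustration, the divergence between the pre- and post-processing distributions of a single feature could be estimated from histograms as sketched below. The binning, the small smoothing constant that avoids empty bins, and the example values are assumptions for this sketch and are not taken from the dataset.

```python
import math


def histogram(values, lo, hi, bins=10, eps=1e-9):
    """Normalised histogram over [lo, hi]; eps keeps every bin non-zero."""
    counts = [eps] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = max(0, min(int((v - lo) / width), bins - 1))  # clamp to range
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i), in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical values of one audio feature within one cluster.
before = [0.1, 0.2, 0.2, 0.3, 0.4]   # input signal, distribution Q
after = [0.5, 0.6, 0.6, 0.7, 0.9]    # processed signal, distribution P
q = histogram(before, 0.0, 1.0, bins=5)
p = histogram(after, 0.0, 1.0, bins=5)
divergence = kl_divergence(p, q)
```

Because KLD is asymmetric, swapping the roles of P and Q generally gives a different value, so the direction stated in the text matters.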
3. RESULTS

The data exhibits strong clustering tendency, with a Hopkins
Statistic of 0.561 (SD: 0.043) for warm and 0.543 (SD: 0.027)
for bright. This suggests the formation of sub-representations
is plausible given the organisation of settings in
reduced-dimensionality space. The validity of the sub-representations
after clustering is shown in Table 1, where strong positive
results are seen for both descriptors. This suggests the
formation of reliable clusters after k-means has been applied.
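As a rough sketch of how a Hopkins Statistic can be estimated, the code below compares nearest-neighbour distances from uniformly sampled probe points (u) with those from sampled data points (w) and returns sum(u) / (sum(u) + sum(w)). Conventions for this statistic differ between formulations and the text does not state which one was used, so the formula, the bounding-box sampling, the sample size `m` and the function name are all assumptions.

```python
import math
import random


def hopkins(points, m, seed=0):
    """Estimate the Hopkins Statistic from m probes and m sampled data points."""
    rng = random.Random(seed)
    dims = list(zip(*points))
    lows = [min(d) for d in dims]
    highs = [max(d) for d in dims]

    # w: distance from each sampled data point to its nearest other data point.
    w = []
    for i in rng.sample(range(len(points)), m):
        w.append(min(math.dist(points[i], q)
                     for j, q in enumerate(points) if j != i))

    # u: distance from each uniform probe in the bounding box to the data.
    u = []
    for _ in range(m):
        probe = tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
        u.append(min(math.dist(probe, q) for q in points))

    return sum(u) / (sum(u) + sum(w))


# Toy data: two very tight 3x3 grids of points placed far apart.
grid = [(0.01 * i, 0.01 * j) for i in range(3) for j in range(3)]
clustered = grid + [(x + 10, y + 10) for x, y in grid]
h = hopkins(clustered, m=5)
```

Repeating the estimate with different seeds gives a mean and standard deviation, which is one way the SD values reported above could arise.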



          Ideal Correlation   Silhouette Score   μ
Warm      0.6858              0.4685             0.5771
Bright    0.6148              0.4302             0.5224

Table 1: Cluster validity metrics for both descriptors after
k-means


(a) Descriptor entries mapped into a navigable 2-dimensional space
with 3 clusters


KLD is applied to the feature distributions before and after
audio processing, and the features from each cluster are
ranked. Only features that were significantly higher (p <
.05) than the distribution mean were included. Within the
warm data, Smoothness (10.01) and Tonality (5.37) were
salient for cluster 1, and Spectral Flatness (14.62) was salient
for cluster 3. Within the bright data, Smoothness (12.6) and
MFCC 9 (12.09) were salient for cluster 1, Spectral Flatness
(16.8) was salient for cluster 2, and MFCC 12 (11.71)
was salient for cluster 3. This suggests the groups can be
represented using changes in external feature data.

4. INTERFACE 

The interface (shown in Figure 1(a)) allows users to navigate
the sub-representations, benefitting from recommended
settings in real-time. Modifications to the equalisation parameters
can be applied in either low- or high-dimensional
space, where the relevant sub-representation is found by
minimising the Euclidean distance between the user input
and each of the cluster centroids in 2-dimensional space.
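The nearest-centroid lookup described above can be sketched in a few lines. The centroid coordinates and the function name are hypothetical and serve only to illustrate the selection rule.

```python
import math


def nearest_cluster(position, centroids):
    """Index of the centroid with the smallest Euclidean distance to position."""
    return min(range(len(centroids)),
               key=lambda k: math.dist(position, centroids[k]))


# Hypothetical centroids of three sub-representations in the 2-D space.
example_centroids = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]
selected = nearest_cluster((0.9, 0.1), example_centroids)
```

If the user position is equidistant from two centroids, this sketch returns the lower index; an interface might instead keep the previously selected cluster to avoid flicker.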

The frequency analyser (Figure 1(b)) provides feedback
about (1) the current curve, (2) the boundaries of the current
sub-representation and (3) the ideal EQ curve, given
the current sub-representation. To derive bounding curves,
all of the points in the 2-dimensional cluster are mapped to
the parameter space using the decoder layers in the autoencoder,
then the minimal and maximal values are selected
for each parameter. To find the ideal EQ curve, the centroid
of the current cluster is used as the input to the decoder,
resulting in a 13-dimensional vector of EQ parameters.
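The derivation of the bounding and ideal parameter vectors can be sketched as follows. The trained decoder is not available here, so `toy_decode` is a hypothetical stand-in that maps a 2-dimensional point to three parameters rather than 13; only the per-parameter minimum and maximum step and the centroid decoding reflect the text.

```python
def bounding_parameters(cluster_points, decode):
    """Decode every point in a cluster, then take per-parameter min and max."""
    decoded = [decode(p) for p in cluster_points]
    lower = [min(column) for column in zip(*decoded)]
    upper = [max(column) for column in zip(*decoded)]
    return lower, upper


def centroid_of(cluster_points):
    """Arithmetic mean of the points along each dimension."""
    n = len(cluster_points)
    return tuple(sum(axis) / n for axis in zip(*cluster_points))


def toy_decode(point):
    """Hypothetical decoder: a fixed linear map from 2-D to 3 parameters."""
    x, y = point
    return (x + y, x - y, 2 * x)


cluster = [(0, 0), (1, 0), (0, 1)]
lower, upper = bounding_parameters(cluster, toy_decode)
ideal = toy_decode(centroid_of(cluster))
```

With a non-linear decoder the decoded centroid need not lie midway between the two bounds, which is why the ideal curve is decoded separately rather than averaged from them.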

5. CONCLUSION

We identify sub-representations in a dataset of semantically
annotated equalisation data. This is done by applying clustering
in reduced-dimensionality space and using clustering
tendency measures to assess salience. We evaluate the
extent to which additional metadata (audio features) captured
using the SAFE plugin can describe the clusters and
find that Smoothness, Spectral Flatness and MFCCs score
particularly highly with a number of clusters. We conclude
by presenting an interface that allows users to explore the
sub-representations in a low-dimensional space whilst making
recommendations based on cluster centroids.

(b) The corresponding equalisation interface

Figure 1: The 2-part interface for cluster navigation